SWAGOLX.EXE (c) 1993 GDSOFT ALL RIGHTS RESERVED
00006  PARSING/TOKENIZING ROUTINES

1  05-28-93 13:54  ALL  SWAG SUPPORT TEAM  PARSENUM.PAS  IMPORT  55

Type
  RW_toKEN = Record
    token_str : String[9];
    token_cod : toKEN_CODE;
  end;

  RW_Type = Array[0..9] of RW_toKEN;
  RWT_PTR = ^RW_Type;

Const
  NULL = '';

  Rw_2 : RW_Type = ((token_str : 'do'; token_cod : tdo),
                    (token_str : 'if'; token_cod : tif),
                    (token_str : 'in'; token_cod : tin),
                    (token_str : 'of'; token_cod : tof),
                    (token_str : 'or'; token_cod : tor),
                    (token_str : 'to'; token_cod : tto),
                    (token_str : NULL; token_cod : NO_toKEN),
                    (token_str : NULL; token_cod : NO_toKEN),
                    (token_str : NULL; token_cod : NO_toKEN),
                    (token_str : NULL; token_cod : NO_toKEN)
                   );

 ...the difference being the explicit declaration of the Constant
 Record fields. (I'm used to Array Constants, not Record
 Constants - I was unaware of the requirement)

 PARSING NUMBERS

 Now we'll concentrate on parsing Integer and Real numbers.

 The Pascal definition of a number begins with an UNSIGNED
 Integer. An unsigned Integer consists of one or more consecutive
 DIGITS. The simplest form of a number token is an unsigned
 Integer:

   1   9   120   12654

 A number token can also be an unsigned Integer (the whole part)
 followed by a fraction part. A fraction part consists of a
 decimal point followed by an unsigned Integer, such as:

   123.45   0.9987564

 These numbers have whole parts 123 and 0 respectively, and
 fraction parts .45 and .9987564 respectively.

 A number token can also be a whole part followed by an EXPONENT
 part. An exponent part consists of an "E" (or "e") followed by
 an unsigned Integer.
 An optional exponent sign, + or -, can
 appear between the letter and the first exponent digit.
 Examples:

   134e2   2E99   123e-45   73623E+4

 Finally, a number token can be a whole part followed by a
 fraction part and an exponent part, in that order:

   2.3498E7   0.00034e-66

 I arbitrarily limit the number of digits to 20 and the exponent
 value to the range -37 to +37. The exact limits needed depend on
 how Real values are represented on the computer.

 The "get_number" Function is likely to be the biggest Function
 in your scanner, but it should be relatively straightforward to
 code, in light of what has already been done with the scanner/
 tokenizer module and the definition of a number.

 EXERCISE #1

 Write the get_number Function to parse Integers and Real
 numbers.

 You will need to add the following Types and Variables to your
 global data segment:

 Type   { add "Real"s to list... }

   LITERAL_Type = (Integer_LIT, Real_LIT, String_LIT);

   LITERAL_REC = Record
     Case lType : LITERAL_Type of
       Integer_LIT : (ivalue : Integer);
       Real_LIT    : (rvalue : Real   );
       String_LIT  : (svalue : String );
   end;

 Var

   digit_count : Word;
   count_error : Boolean;

-------------- PART 2 ---------------------------------------

 The rest of this post will cover two simple topics - parsing
 Strings inside quotes, and parsing comments.

 PARSING COMMENTS {}

 The Compiler should ignore the input between two curly braces
 ({}), and the curly braces themselves. My scanner is written so
 the entire comment is replaced by a single blank (" "), although
 you could write the scanner so that comments are _totally_
 ignored.

 EXERCISE #2:

 Integrate COMMENT detection into the get_Char routine, so that
 your character-fetching routine ignores comments: it passes a
 blank when a comment is encountered and skips the comment
 entirely for the next fetch.

 Make sure that the routine keeps reading until the right curly
 brace is detected, even past the end-of-line.
 If the end-of-file
 is encountered before the right curly brace is found, an
 "unexpected end" error should be generated.

 PARSING STRINGS (QUOTES) ''

 The quote character delimits Strings. Any character between the
 quotes is ignored by the Compiler, except to be stored as a String
 LITERAL. If you wish a ' (quote) to be included in the literal,
 an extra ' must precede it.

 One possible tricky area is the {} (comment) character. You must
 be careful not to inadvertently trigger the comment routine within
 the quote routine while reading a String, otherwise you will
 have a BUG.

 EXERCISE #3:

 Add a quote routine to the get_token routine within your module,
 to fetch Strings as a LITERAL IDENTIFIER when the QUOTE
 character is detected.

 The following mods to your Types are required:

Const
  Eof_Char = #$7F;

Type
  Char_CODE = (LETTER, DIGIT, QUOTE, SPECIAL, Eof_CODE);

  { The following code initializes the character mapping table: }

Var
  ch : Byte;
begin
  For ch := 0 to 255 do
    Char_table[ch] := SPECIAL;
  For ch := ord('0') to ord('9') do
    Char_table[ch] := DIGIT;

  For ch := ord('A') to ord('Z') do
    Char_table[ch] := LETTER;
  For ch := ord('a') to ord('z') do
    Char_table[ch] := LETTER;

  Char_table[ord(Eof_Char)] := Eof_CODE;

  Char_table[39] := QUOTE;    { 39 = ord('''') }
end;

 ----------------------------------------------------------------

 PLEASE, please let me know what you think about these posts,
 even if your comments are negative - I want to have some feedback
 on the difficulties, and whether or not people are having trouble
 following the material - I _can_ be more detailed at the cost of
 being more verbose - if it's needed!

 If you are having problems with your source code, and want me to
 do a detailed examination of your code, especially if it's
 written in a language other than Pascal, send me email via the
 Internet - to avoid "carpet bombing" the conference with
 undesired material.


 NEXT POST:

 Error codes, and putting your code to the test - our first
 utility (other than
 the lister): a source Program Compactor
 (not cruncher).

 FUTURE POSTS:

 - Review and (hopefully) a status report from "students"
 - Symbol table
 - YA utility (cross-referencer)
 - YA utility (source Program CRUNCHer)
 - YA utility (source Program UNcruncher)
 - Parsing simple expressions
 - Utility: CALC, using infix-to-postfix conversions and stack
   ops.
 - Parsing statements
 - Utility: Pascal syntax checker part I
 - Parsing declarations (Var, Type, etc)
   incl's: much improved (and much more complex) symbol table
 - Utility: Declarations analyzer.
 - Syntax Checker part II
 - Parsing Program, Procedure, and Function declarations
   (routines).
 - Syntax checker Part III

 - Review and discussion?

2  05-28-93 13:54  ALL  SWAG SUPPORT TEAM  PARSEWRD.PAS  IMPORT  33

Program PARSER;

{The Object of this Program is to accept a sentence from the user then to break the
 sentence into its Component Words and to display each Word on a separate line.
}

Uses Crt; {Required by Turbo Pascal}

Const
  maxWord     = 15;
  maxsentence = 15;
  space       = CHR(32);
  first       = 1;

Type
  Strng = Array[1..maxWord] of Char;
  Word  = Record
    body   : Strng;
    length : Integer
  end;

Var
  sentence : Array[1..maxsentence] of Word;
  row, col, nextcol, count : Integer;
  demarker : Boolean;
  ans      : Char;

Procedure SpaceTrap;
{ Ensures that there is only 1 space between Words }
begin
  Repeat
    READ(sentence[row].body[first])
  Until sentence[row].body[first] <> space
end;

Procedure StringWrite(Var phrase : Word);
{Writes only the required length of each Character String.
This is required when using 32 col.
mode.}
Var
  letter : Integer;
begin
  For letter := first to phrase.length do
    Write(phrase.body[letter])
end; {Procedure StringWrite}

Procedure StringRead;
Var I : Integer;
begin
  {
   Initialize the Variables
  }
  count := 1;
  row := first;
  col := first;
  nextcol := col + 1;
  demarker := False;
  For I := first to maxsentence do
    sentence[I].length := 1;
  Write('Type a sentence > ');
  {READLN;} {Clears the buffer of EOLN}
            {Required by HiSoft Pascal}
  While (not EOLN) and (row < maxsentence) do
  begin
    READ(sentence[row].body[col]);
    if sentence[row].body[first] = space then SpaceTrap;
    if sentence[row].body[col] = space then
      demarker := True;
    if (not demarker) and (nextcol < maxWord) then
    begin
      col := col + 1;
      nextcol := nextcol + 1
    end
    else
    begin
      sentence[row].length := col;
      count := count + 1;
      row := row + 1;
      col := first;
      nextcol := col + 1;
      demarker := False
    end; {if...then...else}
    if EOLN then sentence[row].length := col - 1
    {Accounts for the last Word entered less the EOLN marker.}
  end {While loop}
end; {Procedure StringRead}

Procedure PrintItOut;
Var
  subsequent : Integer;
begin
  subsequent := first + 1;
  Write('Parsing > ');
  StringWrite(sentence[first]);
  WriteLN;
  if count >= subsequent then
  begin
    For row := subsequent to count do
    begin
      Write(' ');
      StringWrite(sentence[row]);
      WriteLN
    end
  end
end; {Procedure PrintItOut}

Procedure SongandDance;
begin
  {PAGE;} {HiSoft Pascal = Turbo Pascal ClrScr}
  ClrScr;
  WriteLN(' Parser');
  WriteLN;
  WriteLN(' Program By David Solly');
  WriteLN;
  WriteLN(' The Object of this Program');
  WriteLN('is to accept a sentence from');
  WriteLN('the user then to break the');
  WriteLN('sentence down into its');
  WriteLN('Component Words and to display');
  WriteLN('each Word on a separate line.');
  WriteLN;
  WriteLN;
end; {Procedure SongandDance}

begin {Main Program}
  SongandDance;
  StringRead;
  WriteLN;
  PrintItOut;
  WriteLN;
  WriteLN('end of Demonstration.');
  READLN(ans);
end.
{Main Program}

3  08-17-93 08:50  ALL  RYAN THOMPSON  Command Line Parsing  IMPORT  37

===========================================================================
 BBS: Canada Remote Systems
Date: 08-10-93 (01:00)              Number: 33744
From: RYAN THOMPSON                 Refer#: NONE
  To: TERRY GRANT @ 912/701          Recvd: NO
Subj: RE: COMMAND LINE PARSING        Conf: (1221) F-PASCAL
---------------------------------------------------------------------------
>>> Quoting message from Terry Grant @ 912/701 to All
>>> Original sent 07 Aug 93 20:36:00 about Command Line Parsing

TG> Hello All!
TG>
TG> After working on this for a while, I thought maybe someone else could help
TG> me out a little here. All I need this to do is Parse the command line for
TG> seven parameters,
TG>
TG> The BaudRate (/B),
TG> :
TG> and Overlay Size (/O).
TG>
TG> My Main problem here is, it will SEE the command line, But WILL NOT allow
TG> me to use anything AFTER the Switch ? Like /B2400 !

 Sure thing! I once wrote a unit which among other things has some neat
parsing for the command line. Here's a snippet:

{- Top -}

  Function SwitchNum(S : String) : Integer;
    { If a switch character specified exists, return which position }
    { it is in on the command line. Used internally.                }
    { Note: UpString is not a standard Turbo Pascal routine; it is  }
    { presumably an upper-casing helper from the same unit.         }
  Var
    Temp : String;
    X,
    Y    : Integer;
  Begin
    Temp:= '';
    X:= ParamCount;
    Y:= 0;
    while (X > 0) and (Y = 0) do begin
      Temp:= ParamStr(X);
      if (Temp[1] = '/') or (Temp[1] = '-') then
        if UpCase(Temp[2]) = UpString(S) then Y:= X;
      Dec(X);
    end;
    SwitchNum:= Y;
  End;


  Function SwitchThere(S : String) : Boolean;
    { Returns TRUE if a switch of the character specified exists. }
  Begin
    If SwitchNum(S) = 0 then SwitchThere:= False
    else SwitchThere:= True;
  End;


  Function SwitchData(S : String) : String;
    { Return the data following a switch: /B2400 returns 2400.
    }
  Var
    Temp : String;
  Begin
    If SwitchNum(S) > 0 then begin
      Temp:= ParamStr(SwitchNum(S));
      Delete(Temp, 1, 2);
    end
    else Temp:= '';
    SwitchData:= Temp;
  End;


  Function Parameter(N : Byte) : String;
    { Returns the Nth command line parameter. Parameters in quotes    }
    { are returned with the spaces in between: /D Test "One Two"      }
    { Returns >Test< for Parameter(1) and >"One Two< for Parameter(2) }
    { This allows you to, if you like, see what type of quote was     }
    { used, for perhaps literal vs. translate to ALL CAPS.            }
  Var
    X,
    Count : Byte;
    Parm,
    Temp  : String;
  Begin
    X:= 0;
    Count:= 0;
    Parm:= '';
    If ParamCount > 0 then repeat
      Inc(X);
      Temp:= ParamStr(X);
      If (Temp[1] = '"') or (Temp[1] = '''') then begin
        Parm:= Temp;
        If X < ParamCount then repeat
          Inc(X);
          Parm:= Parm + ' ' + ParamStr(X);
        until (Parm[Length(Parm)] = '"') or
              (Parm[Length(Parm)] = '''') or (X = ParamCount);
        Inc(Count);
      end
      else if (Temp[1] <> '/') and (Temp[1] <> '-')
      then begin
        Inc(Count);
        Parm:= Temp;
      end;
    until (X = ParamCount) or (Count = N);
    If Count = N then Parameter:= Parm
    else Parameter:= '';
  End;


  Function Parameters : Byte;
    { Return the number of non-switch parameters on the command line. }
  Var
    X : Byte;
  Begin
    X:= 0;
    If ParamCount > 0 then begin
      Repeat
        Inc(X)
      Until Parameter(X) = '';
      Parameters:= X - 1;
    end
    else Parameters:= 0;
  End;

{- Fin -}

 A few examples:

  If SwitchThere('?') then DisplayHelp;
  If SwitchThere('B') then BaudString:= SwitchData('B');
  If Parameters < 1 then begin WriteLn('Too few parms'); Halt; end;
  For X:= 1 to Parameters do
  begin
    Param[X]:= Parameter(X);
  end;

 Sample command lines:

  TESTPROG /D /F TEST /B2400 "This is a test" /M-

  Parameters returns 2,
  Parameter(1) returns TEST
  Parameter(2) returns "This is a test
  SwitchThere('L') returns False
  SwitchData('M') returns -
  SwitchData('G') returns null.

 I hope this helps you out!
It could be optimized a lot by simply reading
all of the parameters into an array in your initialization code, to eliminate
all of the redundant parsing, but I don't think that parsing time for a few
hundred characters at most is a limiting factor of any sort. ;-)

bye
Ryan

--- Renegade v07-17 Beta

4  09-26-93 09:12  ALL  MARTIN RICHARDSON  Check for CmdLine switch  SWAG9311  7

{*****************************************************************************
 * Function ...... IsSwitch()
 * Purpose ....... To test for the presence of a switch on the command line
 * Parameters .... sSwitch    Switch to scan the command line for
 * Returns ....... .T. if the switch was found
 * Notes ......... Uses functions Command and UpperCase
 * Author ........ Martin Richardson
 * Date .......... September 28, 1992
 *****************************************************************************}
FUNCTION IsSwitch( sSwitch: STRING ): BOOLEAN;
BEGIN
  IsSwitch := (POS( '/'+sSwitch, UpperCase(Command) ) <> 0) OR
              (POS( '-'+sSwitch, UpperCase(Command) ) <> 0);
END;

5  09-26-93 09:22  ALL  MARTIN RICHARDSON  Parse out tokens  SWAG9311  16

{*****************************************************************************
 * Function ...... ParseCount()
 * Purpose ....... To count the number of tokens in a string
 * Parameters .... cString    String to count tokens in
 *                 cChar      Token separator
 * Returns ....... Number of tokens in <cString>
 * Notes ......... Uses function StripChar
 * Author ........ Martin Richardson
 * Date .......... September 30, 1992
 *****************************************************************************}
FUNCTION ParseCount( cString: STRING; cChar: CHAR ): INTEGER;
BEGIN
  ParseCount := LENGTH(cString) - LENGTH(StripChar(cString, cChar)) + 1;
END;

{*****************************************************************************
 * Function ...... Parse()
 * Purpose ....... To parse out tokens from a string
 * Parameters ....
 *                 cString    String to parse
 *                 nIndex     Token number to return
 *                 cChar      Token separator
 * Returns ....... Token <nIndex> extracted from <cString>
 * Notes ......... If <nIndex> is greater than the number of tokens in
 *                 <cString> then a null string is returned.
 *                 Uses functions Left, Right, and ParseCount
 * Author ........ Martin Richardson
 * Date .......... September 30, 1992
 *****************************************************************************}
FUNCTION Parse( cString: STRING; nIndex: INTEGER; cChar: CHAR ): STRING;
VAR
  i: INTEGER;
  cResult: STRING;
BEGIN
  IF nIndex > ParseCount( cString, cChar ) THEN
    cResult := ''
  ELSE BEGIN
    cString := cString + cChar;
    FOR i := 1 TO nIndex DO BEGIN
      cResult := Left( cString, POS( cChar, cString ) - 1 );
      cString := Right(cString, LENGTH(cString) - POS(cChar, cString));
    END { Next I };
  END { IF };
  Parse := cResult;
END;

6  10-28-93 11:35  ALL  RYAN THOMPSON  Command Line Parsing  SWAG9311  31

{===========================================================================
 BBS: Canada Remote Systems
From: RYAN THOMPSON
Subj: RE: COMMAND LINE PARSING

>>> Quoting from Chet Kress to Frans Van Duinen about Command Line Parsing

CK> FVD>I want to pass to my BP 7 program a few parameters, one of which
CK> FVD>has embedded (or even trailing) blanks. The naive approach of
CK> FVD>PROCFAX PROCFAX.CFG \PCB\MAIN\MSGS58 "FAX MAIL" does not work.
CK> FVD>Currently I pick up FAX and MAIL as two parameters and
CK> FVD>string, but I want to allow multiple embedded/trailing blanks.

 Here's a set of routines to do just what you want.

 Parameters      Returns the number of parameters on the command line.
                 Does
                 not include switches.
 Parameter(n)    Returns the nth parameter, ignoring switches and passing
                 strings in quotes as " or ' followed by the entire string
                 including any embedded spaces.
 SwitchThere(x)  Returns True if the switch specified by the character
                 passed is present on the command line.
 SwitchData(x)   Returns the data following the switch character if the
                 switch character specified is present on the command line.
 SwitchNum(x)    Returns the position on the command line of the switch
                 specified. Skips parameters. }


  { Note: UpString, used in SwitchNum, is not a standard Turbo Pascal }
  { routine; it is presumably an upper-casing helper from the same    }
  { unit.                                                             }
  Function SwitchNum(S : String) : Integer;
  Var
    Temp : String;
    X,
    Y    : Integer;
  Begin
    Temp:= '';
    X:= ParamCount;
    Y:= 0;
    while (X > 0) and (Y = 0) do begin
      Temp:= ParamStr(X);
      if (Temp[1] = '/') or (Temp[1] = '-') then
        if UpCase(Temp[2]) = UpString(S) then Y:= X;
      Dec(X);
    end;
    SwitchNum:= Y;
  End;


  Function SwitchThere(S : String) : Boolean;
  Begin
    SwitchThere:= not (SwitchNum(S) = 0);
  End;


  Function SwitchData(S : String) : String;
  Var
    Temp : String;
  Begin
    If SwitchNum(S) > 0 then begin
      Temp:= ParamStr(SwitchNum(S));
      Delete(Temp, 1, 2);
    end
    else Temp:= '';
    SwitchData:= Temp;
  End;


  Function Parameter(N : Byte) : String;
  Var
    X,
    Count : Byte;
    Parm,
    Temp  : String;
  Begin
    X:= 0;
    Count:= 0;
    Parm:= '';
    If ParamCount > 0 then repeat
      Inc(X);
      Temp:= ParamStr(X);
      If (Temp[1] = '"') or (Temp[1] = '''') then begin
        Parm:= Temp;
        If X < ParamCount then repeat
          Inc(X);
          Parm:= Parm + ' ' + ParamStr(X);
        until (Parm[Length(Parm)] = '"') or
              (Parm[Length(Parm)] = '''') or (X = ParamCount);
        Inc(Count);
      end
      else if (Temp[1] <> '/') and (Temp[1] <> '-')
      then begin
        Inc(Count);
        Parm:= Temp;
      end;
    until (X = ParamCount) or (Count = N);
    If Count = N then Parameter:= Parm
    else Parameter:= '';
  End;


  Function Parameters : Byte;
  Var
    X : Byte;
  Begin
    X:= 0;
    If ParamCount > 0 then begin
      Repeat
        Inc(X)
      Until Parameter(X) = '';
      Parameters:= X - 1;
    end
    else Parameters:= 0;
  End;

{
 For example, the command
 line:

TESTPRG /C INPUT.DAT /X67 "first one"

 Parameters returns 2
 Parameter(1) returns INPUT.DAT
 Parameter(2) returns "first one
 SwitchThere('F') returns false
 SwitchData('X') returns 67

 Notice that in quoted parameters, the first quote is returned - this allows
you to check for " vs. ', which you could use as the difference between case
sensitive and non-case-sensitive. A simple Delete(S,1,1) can remove it from
the string for use. }
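
 The quote check and the Delete(S,1,1) cleanup described above might be
sketched roughly as follows. SplitQuoted and the QuoteDemo program are
hypothetical names and are not part of the posted routines. The sketch also
drops a matching trailing quote when one is present, which is only an
assumption about what a caller would want.

```pascal
program QuoteDemo;

{ Hypothetical helper (not from the posted unit): takes a value as     }
{ returned by Parameter() and splits it into the quote character used  }
{ and the text without quotes. Quote is #0 if there was no quote.      }
procedure SplitQuoted(Raw : String; var Quote : Char; var Body : String);
begin
  Quote := #0;
  Body  := Raw;
  if (Length(Raw) > 0) and ((Raw[1] = '"') or (Raw[1] = '''')) then
  begin
    Quote := Raw[1];
    Delete(Body, 1, 1);                 { remove the leading quote }
    { assumption: also remove a matching trailing quote, if any }
    if (Length(Body) > 0) and (Body[Length(Body)] = Quote) then
      Delete(Body, Length(Body), 1);
  end;
end;

var
  Q : Char;
  S : String;
begin
  SplitQuoted('"first one', Q, S);
  WriteLn(Q, ' -> ', S);                { prints: " -> first one }

  SplitQuoted('''second one''', Q, S);
  WriteLn(Q, ' -> ', S);                { prints: ' -> second one }

  SplitQuoted('INPUT.DAT', Q, S);
  if Q = #0 then
    WriteLn('unquoted -> ', S);         { prints: unquoted -> INPUT.DAT }
end.
```

 A caller could then, for instance, treat Q = '"' as case sensitive and
Q = '''' as non-case-sensitive, as the post suggests.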